Maternal Mortality Ratio
The definition of maternal mortality is:
“A death of a woman while pregnant or within 42 days of termination of pregnancy, irrespective of the duration and site of pregnancy, from any cause related or aggravated by the pregnancy or its management, but not from accidental or incidental causes.”
The definition of maternal mortality ratio is:
“The number of maternal deaths during a given time period per 100,000 live births during the same time period.”
In this notebook we will create a long-term time-series of maternal mortality ratio. To do this we will combine two existing datasets:
- GapMinder (1751-2008)
- World Health Organization (2000-2017)
We make the assumption that their methods are comparable and that the data can be combined without transformation.
In years where there is an overlap in both time-series we use the data from the WHO.
Downloading the GapMinder data
Firstly we download the data from GapMinder and save it locally. Then we read in the GapMinder data, clean up the column names and take a look at the data.
Code
dir.create("FionaSpooner/maternal_mortality/data/input", recursive = TRUE)
download.file('https://www.gapminder.org/documentation/documentation/gapdata010.xls', paste0('FionaSpooner/maternal_mortality/data/input/',date,'_gapminder_data.xls'))
df <- read_xls(paste0('data/input/',date,'_gapminder_data.xls'), col_names = TRUE)
df <- df[-(1:16),]
df <- janitor::clean_names(df)
head(df)# A tibble: 6 × 13
country year mmr live_births maternal_deaths women_reproduct… mm_rate x8
<chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <lgl>
1 Austra… 1871… 643 <NA> <NA> NA NA NA
2 Austra… 1881… 592 <NA> <NA> NA NA NA
3 Austra… 1891… 560 <NA> <NA> NA NA NA
4 Austra… 1901… 609 <NA> <NA> 940332 0 NA
5 Austra… 1906 538 107890 <NA> NA NA NA
6 Austra… 1907 556.… 110347 614 NA NA NA
# … with 5 more variables: skilled_attendance_percent <dbl>,
# source_mmr_maternal_death_and_live_birth <chr>, comment <chr>,
# source_women_reproductive_age <chr>, comments <lgl>
Cleaning the GapMinder data
There are some issues with the GapMinder data - we will manually clean them here.
The first three rows in the ‘year’ column seem to be wrong as it seems as if each row covers 110 years, but in another source they cover only 10 years.
I will replace them manually here with the middle of each decade, as is used in the other data from the same source.
There are cases where the year for Finland has been entered incorrectly three times, there are two 1772s, 1775s & 1967s, the second of each should be 1872, 1875 and 1957 respectively.
There are also two errors for New Zealand, and one error for both Sweden and the US.
Sri Lanka’s maternal mortality rate seems to drop to zero in 1990, here we set it to NA instead.
Code
df$year[1:3] <- c(1875, 1885,1895)
df['year'][df$year == '1772' & df$country == 'Finland' & df$maternal_deaths == 487,] <- '1872'
df['year'][df$year == '1775' & df$country == 'Finland' & df$maternal_deaths == 629,] <- '1875'
df['year'][df$year == '1967' & df$country == 'Finland' & df$maternal_deaths == 77,] <- '1957'
df['year'][df$year == '1967' & df$country == 'Sweden' & df$maternal_deaths == 39,] <- '1957'
df['year'][df$year == '1967' & df$country == 'United States' & df$maternal_deaths == 1766.28,] <- '1957'
df['year'][df$year =='1989-02'& df$country == 'New Zealand',] <- '1889-02'
df <- df[-which(df$year == '1950' & df$country == 'New Zealand' & df$mmr == 90),]
df['mmr'][df$country == "Sri Lanka" & df$year == 1990,] <- NAFinding the mid-point of each year range
For some rows a range of years is given for a particular maternal mortality rate. We want to find the mid-point of that range. We use this function to find the mid-value of years given.
Code
mid_year <- function(year) {
if(grepl("-", year)){
year_split <- unlist(strsplit(year, split = "-"))
year_begin <- year_split[1]
year_end_len <- nchar(year_split[2])
year_end_pref <- 4 - year_end_len
if (year_end_pref > 0) {
year_pref <- substr(year_begin, 1, year_end_pref)
year_end <- as.numeric(paste(c(year_pref, year_split[2]),collapse = ""))
year_out <- round(mean(c(as.numeric(year_begin), year_end)))
}else{
year_out <- round(mean(as.numeric(c(year_begin,year_split[2]))))
}
}else{
year_out <- year
}
return(year_out)
}
year_out_all <- unlist(lapply(df$year, mid_year))
df$mid_year <- as.numeric(year_out_all)Let’s check the mid-year estimates have worked as expected. There are three cases where the mid-year estimate is not the mid-point of the range of years. Let’s fix these manually. We will keep only the columns we want to take forward.
Code
df %>%
select(year, mid_year) %>%
filter(grepl("-",year)) %>%
mutate(year_begin = substr(year, 1, 4)) %>%
filter(year_begin >= mid_year)# A tibble: 3 × 3
year mid_year year_begin
<chr> <dbl> <chr>
1 1897-01 1849 1897
2 1889-02 1846 1889
3 1899-03 1851 1899
Code
df['mid_year'][df$year == "1897-01",] <- 1899
df['mid_year'][df$year == "1889-02",] <- 1895
df['mid_year'][df$year == "1889-03",] <- 1896
df <- df %>%
select(entity = country, year = mid_year, maternal_mortality_rate = mmr)Reading in the WHO data
Read in data from WDI - in future iterations we will read this in from ETL so it is auto updated. The WDI sources this variable from the WHO so we will henceforth refer to it as the WHO data.
Code
download.file('https://api.worldbank.org/v2/en/indicator/SH.STA.MMRT?downloadformat=csv', paste0('data/input/',date,'_wdi_maternal_mortality.zip'))
unzip(paste0('data/input/',date,'_wdi_maternal_mortality.zip'), exdir = "data/input")
des_file <- unzip(paste0('data/input/',date,'_wdi_maternal_mortality.zip'), list=TRUE)
file_name <- des_file$Name[startsWith( des_file$Name,"API_SH.STA.MMRT_DS2")]
who <- clean_names(read.csv(paste0('data/input/', file_name), skip = 4))Cleaning the WHO data
Pivot and clean the WHO data to long format. Check the GapMinder countries are in the WHO dataset
Code
who_clean <- who %>%
select(country_name, starts_with("X")) %>%
select(-x) %>%
pivot_longer(starts_with("x")) %>%
mutate(year = as.numeric(gsub("x", "", name))) %>%
select(entity = country_name, year, maternal_mortality_rate = value)
head(who_clean)# A tibble: 6 × 3
entity year maternal_mortality_rate
<chr> <dbl> <int>
1 Aruba 1960 NA
2 Aruba 1961 NA
3 Aruba 1962 NA
4 Aruba 1963 NA
5 Aruba 1964 NA
6 Aruba 1965 NA
Code
all(unique(df$entity) %in% who_clean$entity)[1] TRUE
Merging the WHO data with the GapMinder data
Combine the two datasets and add a source column so we can keep track of where each datapoint comes from.
Code
df$source <- 'GapMinder'
who_clean$source <- 'WHO_GHO'
df_all <- rbind(df, who_clean)
df_all$maternal_mortality_rate <- as.numeric(df_all$maternal_mortality_rate)
head(df_all)# A tibble: 6 × 4
entity year maternal_mortality_rate source
<chr> <dbl> <dbl> <chr>
1 Australia 1875 643 GapMinder
2 Australia 1885 592 GapMinder
3 Australia 1895 560 GapMinder
4 Australia 1903 609 GapMinder
5 Australia 1906 538 GapMinder
6 Australia 1907 556. GapMinder
Dealing with duplicate datapoints
Dealing with duplicate years - there are two identical values for NZ in 1950, we can drop one with distinct(). We remove NAs where there is a non-NA value in either Gapminder or WHO. If there are values for both GapMinder and the WHO data we have a preference for the WHO data.
Code
df_check <-df_all %>%
group_by(entity, year) %>%
add_count() %>%
ungroup()
df_sel <- df_check[-which(df_check$n >1 & is.na(df_check$maternal_mortality_rate)),]
df_sel <-df_sel %>%
group_by(entity, year) %>%
add_count() %>%
ungroup() %>%
distinct()
df_sel <- df_sel[-which(df_sel$nn >1 & df_sel$source == 'GapMinder'),]
df_check <- df_sel %>%
group_by(entity, year) %>%
add_count()
sum(df_check$nnn > 1) == 0
df_sel <- df_sel %>% select(-c(n, nn))Standardising the country names
Standardising the country names for use on OWID.
Code
stand_countries <- read.csv("data/input/countries_to_standardise_country_standardized.csv")
df_fin <- df_sel %>% left_join(.,stand_countries, by = c("entity" = "Country")) %>%
mutate(entity = Our.World.In.Data.Name) %>%
select(-c(Our.World.In.Data.Name, source))
df_fin$maternal_mortality_rate[is.na(df_fin$maternal_mortality_rate)] <- ""Cleaning geographic entities
We don’t want to include all of the available regions in the data as it is difficult for us to define them clearly, so we drop out a selection here. We also plot the data for Sweden to check the time-series looks as expected.
Code
entities_to_drop <- c("Africa Eastern and Southern", "Africa Western and Central", "Arab World", "Early-demographic dividend", "East Asia & Pacific (excluding high income)", "East Asia & Pacific (IDA & IBRD countries)", "Euro area", "Europe & Central Asia (excluding high income)", "Europe & Central Asia (IDA & IBRD countries)", "Fragile and conflict affected situations", "Heavily indebted poor countries (HIPC)", "IBRD only", "IDA & IBRD total", "IDA blend", "IDA only", "IDA total", "Late-demographic dividend", "Latin America & Caribbean (excluding high income)", "Latin America & the Caribbean (IDA & IBRD countries)", "Least developed countries: UN classification", "Middle East & North Africa (excluding high income)",
"Middle East & North Africa (IDA & IBRD countries)", "Not classified", "Other small states", "Pacific island small states", "Post-demographic dividend", "Pre-demographic dividend", "Small states", "South Asia (IDA & IBRD)", "Sub-Saharan Africa (excluding high income)", "Sub-Saharan Africa (IDA & IBRD countries)")
df_fin <- df_fin %>%
filter(!entity %in% entities_to_drop)
fig <- plot_ly(data = df_fin %>% filter(entity== "Sweden"), x = ~year,
text = ~paste("MMR: ", round(as.numeric(maternal_mortality_rate),2), '<br>Year:', year)) %>% add_trace(y = ~as.numeric(maternal_mortality_rate), name = 'MMR',mode = 'lines') %>% layout(xaxis = list(title = 'Year'),
yaxis = list(title = 'Maternal Mortality Ratio (per 100,000 live births)'))
figCalculating the proportional change since 2000
We can also calculate the proportional change since 2000. This is useful for understanding the change in the data over time and we can use this data in a slope chart.
Code
df_clean <- df_fin %>%
filter(year %in% c(2000, 2017)) %>%
filter(complete.cases(.)) %>%
mutate(maternal_mortality_rate = as.numeric(maternal_mortality_rate)) %>%
group_by(entity) %>%
add_count() %>%
filter(n == 2)
df_wide <- df_clean %>%
pivot_wider(., names_from = 'year',names_glue = "{.value}_{year}", values_from = 'maternal_mortality_rate') %>%
mutate(change = ((maternal_mortality_rate_2017 - maternal_mortality_rate_2000)/maternal_mortality_rate_2000)*100, baseline = 0) %>%
select(entity, change, baseline) %>%
pivot_longer(cols = c("change", "baseline")) %>%
mutate(name = gsub("change", 2017, name), name = gsub("baseline", 2000, name)) %>%
mutate(year = as.numeric(name)) %>%
select(entity, year, relative_change_mmr = value)
slope_ggplot <- df_wide %>%
highlight_key(., ~entity) %>%
ggplot( aes(x = year, y = round(relative_change_mmr,2), group = entity, text = entity))+
labs(x = "Year", y = "Relative change since 2000 (%)", title = "Relative change in maternal mortality ratio since 2000 (%)")+
geom_line(color = "#888888" )+
theme_bw()+
theme(legend.position="none",
panel.background = element_rect(fill = "#ffffff"),
axis.ticks = element_blank(),
axis.text = element_text(color = "#888888"))
s <- attrs_selected(
line = list(color = "#0000FF",
width = 5)
)
ggplotly(tooltip = "text", slope_ggplot)%>%
config(displayModeBar = FALSE) %>%
highlight(on = "plotly_hover",
opacityDim = 0.7,
color = "blue",
selected = s) %>%
layout(hoverlabel = list(bgcolor = "blue", font = list(color = "FFFFFF", size = 14)))Lastly, we combine the proportional change data with the full time-series of maternal mortality rate and write out the data to a csv file.
Code
df_out <- df_wide %>% right_join(., df_fin, by = c("entity","year"))
df_out$relative_change_mmr[is.na(df_out$relative_change_mmr)] <- ""
write.csv(df_out, "data/output/maternal_mortality_rate_upload.csv", row.names = FALSE)